Prospectus

Text Mining

Course
2022-2023

Admission requirements

Assumed prior knowledge

A Bachelor in AI or Computer Science is recommended for this course, as well as experience with programming in Python.

Description

Text mining, also known as 'knowledge discovery from text', is a research and development field that has received growing attention over the past two decades, drawing researchers from data science, natural language processing, and machine learning. Key applications are text categorization, information extraction, social media mining, and automatic summarization. This course gives an overview of the field from both a theoretical angle (underlying models) and a practical angle (applications, challenges with data). In addition to attending the lectures, students work on practical assignments.

Outline:
Week 1. Introduction
Week 2. Text processing
Week 3. Vector semantics
Week 4. Text categorization
Week 5. Data collection and annotation
Week 6. Neural NLP and transfer learning
Week 7. Information extraction
Week 8. Text summarization
Week 9. Sentiment analysis
Week 10. Biomedical text mining
Week 11. Industrial text mining
Week 12. Conclusions

Course objectives

After successful completion of this course, students have an understanding, at both the conceptual and the technical level, of natural language processing (NLP) methods for the purpose of text mining. Students can build models for a text mining task using machine learning algorithms and text data, and they can evaluate and report on the developed models and modules. Students also understand, from a theoretical perspective, which models are applicable in which situations, and which real-world challenges (such as language variation, labelling noise, and noise due to document processing errors) prevent the application of certain techniques.

Timetable

The most recent timetable can be found on the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with calendar apps on your smartphone, such as Outlook, Google Calendar, and Apple Calendar. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of each change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

Lectures, literature, assignments (no lab sessions).

Assessment method

  • a written individual exam, closed book (50% of course grade)

  • practical assignments in groups (50% of course grade)

    • two assignments (10% each) during the course
    • one more substantial assignment (30%) at the end of the course

To complete the course, the grade for the written exam must be 5.5 or higher, and the average grade for the practical assignments must also be 5.5 or higher. If one of the tasks is not submitted, the grade for that task is 0. Each assignment has a re-sit opportunity (a later submission). The maximum grade for a re-sit assignment is 6.
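As an illustration of how the weights and pass conditions above combine, the following Python sketch computes a course grade. It is not an official calculator. The function and constant names are hypothetical, the "average grade for the practical assignments" is assumed to be the weighted average using the 10/10/30 weights, and no rounding is applied because no rounding rule is stated here.

```python
from typing import Optional, Sequence, Tuple

# Weights taken from the assessment list above.
EXAM_WEIGHT = 0.50
ASSIGNMENT_WEIGHTS = (0.10, 0.10, 0.30)
PASS_THRESHOLD = 5.5
RESIT_CAP = 6.0


def counted_assignment_grade(grade: Optional[float], resit: bool = False) -> float:
    """Grade that counts for one assignment.

    None means the assignment was not submitted (counts as 0).
    A re-sit submission is capped at 6.
    """
    if grade is None:
        return 0.0
    return min(grade, RESIT_CAP) if resit else grade


def course_result(exam: float, assignments: Sequence[float]) -> Tuple[float, bool]:
    """Return (weighted course grade, passed) for one student.

    Assumption: the practical average is weighted by the assignment
    weights; an unweighted mean would be another possible reading.
    """
    weighted_practical = sum(w * g for w, g in zip(ASSIGNMENT_WEIGHTS, assignments))
    practical_average = weighted_practical / sum(ASSIGNMENT_WEIGHTS)
    final_grade = EXAM_WEIGHT * exam + weighted_practical
    passed = exam >= PASS_THRESHOLD and practical_average >= PASS_THRESHOLD
    return final_grade, passed


if __name__ == "__main__":
    # Hypothetical student: second assignment handed in as a re-sit.
    grades = [
        counted_assignment_grade(8.0),
        counted_assignment_grade(7.5, resit=True),
        counted_assignment_grade(7.0),
    ]
    print(course_result(7.0, grades))
```

Note that under this reading a high exam grade cannot compensate for a practical average below 5.5, and vice versa, because both conditions are checked separately.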

The teacher will inform the students how the exam inspection and follow-up discussion will take place.

Reading list

The literature will be distributed on Brightspace. The majority of the chapters come from this book: Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed.), December 2021, https://web.stanford.edu/~jurafsky/slp3/

Registration

From the academic year 2022-2023 onwards, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

Lecturer: dr. S. Verberne
Website: Course website

Remarks

Due to limited capacity, external students can only register after consultation with the programme coordinator/study adviser (mastercs@liacs.leidenuniv.nl).